Abstract

The automata arising from the well-known conversion of regular expressions to
non-deterministic automata have rather particular transition graphs. We refer to them as
Glushkov graphs, in honour of his elegant expression-to-automaton algorithmic shortcut
[10]. The Glushkov graphs have been characterized [6] in terms of simple graph-theoretical
properties and certain reduction rules. We show how to carry this characterization over,
under certain restrictions, to the weighted Glushkov graphs. With
weights in a semiring K, they are defined as the transition Glushkov K-graphs of the
Weighted Finite Automata (WFA) obtained by the generalized Glushkov construction
[4] from K-expressions. The characterization holds provided that the semiring K is factorial
and the K-expressions are in the so-called star normal form (SNF) of Brüggemann-Klein [2]. The
restriction to factorial semirings ensures that the characterization yields algorithms. The
restriction to the SNF would not be necessary if every K-expression were equivalent to some
K-expression in SNF with the same literal length, as is the case for the boolean semiring B;
this remains an open question for a general K.

Keywords: Formal languages, weighted automata, K-expressions. 

1 Introduction 

The extension of boolean algorithms (over languages) to multiplicities (over series) has
always been a central point in theoretical research. First, Schützenberger [17] gave
an equivalence between rational and recognizable series, extending the classical result of
Kleene [13]. Recent contributions have been made in this area; an overview of the state of
knowledge in these domains is presented by Sakarovitch in [16]. Many research works have focused
on producing a small WFA. For example, Caron and Flouret have extended the Glushkov
construction to WFAs [4]. Champarnaud et al. have designed a quadratic algorithm [7] for
computing the equation WFA of a K-expression. This equation WFA has been introduced
by Lombardy and Sakarovitch as an extension of Antimirov's algorithm [J3] based on
partial derivatives.

Moreover, the Glushkov WFA of a K-expression with n occurrences of symbols (we say
that its alphabetic width is equal to n) has only n + 1 states; the equation K-automaton
(which is a quotient of the Glushkov automaton) has at most n + 1 states.

Conversely, classical algorithms compute K-expressions whose size is exponential
with respect to the number of states of the WFA. One example is the block
decomposition algorithm proven in pQ.

In this paper, we also address the problem of computing short K-expressions, and we
focus on a specific kind of conversion based on Glushkov automata. The particularity
of Glushkov automata is the following: any regular expression of width n can
be turned into its Glushkov (n + 1)-state automaton; conversely, if an (n + 1)-state automaton
is a Glushkov one, then it can be turned into an expression of width n. The latter property
is based on the characterization of the family of Glushkov automata in terms of graph
properties presented in [6]. These properties are stability, transversality and reducibility.
Brüggemann-Klein defines regular expressions in Star Normal Form (SNF) [2]. These
expressions are characterized by underlying Glushkov automata where each edge is generated
exactly once. This definition is extended to multiplicities. The study of the SNF case
would not be necessary if all K-expressions were equivalent to some K-expression in SNF with
the same literal length, as is the case for the boolean semiring B.

The aim of this paper is to extend the characterization of Glushkov automata to the
multiplicity case in order to compute a K-expression of width n from an (n + 1)-state WFA.
This extension requires restricting the work to factorial semirings as well as to Star Normal
Form K-expressions.

We exhibit a procedure that, given a WFA M on a factorial semiring, outputs the
following: either M is obtained by the Glushkov algorithm from a proper K-expression E
in Star Normal Form and the procedure computes a K-expression F equivalent to E, or
M is not obtained in that way and the procedure says no.

The following section recalls fundamental notions concerning automata, expressions
and the Glushkov conversion for both the boolean and the multiplicity cases. An error in the
paper by Caron and Ziadi [6] is pointed out and corrected. Section 3 is devoted to the reduction
rules for acyclic K-graphs. Their efficiency is provided by the confluence of K-rules. The
next section gives orbit properties for Glushkov K-graphs. Section 5 presents the
algorithms computing a K-expression from a Glushkov K-graph and details an example.

2 Definitions



2.1 Classical notions 

Let Σ be a finite set of letters (alphabet), ε the empty word and ∅ the empty set. Let
(K, ⊕, ⊗) be a zero-divisor free semiring where 0 is the neutral element of (K, ⊕) and 1 the
one of (K, ⊗). The semiring K is said to be zero-divisor free [11] if 0 ≠ 1 and if, for all
x, y ∈ K, x ⊗ y = 0 ⇒ x = 0 or y = 0.

A formal series [1] is a mapping S from Σ* into K, usually denoted by
$S = \sum_{w \in \Sigma^*} S(w)\,w$, where S(w) ∈ K is the coefficient of w in S. The support
of S is the language Supp(S) = {w ∈ Σ* | S(w) ≠ 0}.

In [TJ], Lombardy and Sakarovitch explain in detail the computation of K-expressions.
We have followed their model of grammar. Our constant symbols are ε, the empty word,
and 0. The binary rational operations are still + and ·; the unary ones are the Kleene closure
*, the positive closure ⁺ and, for every k ∈ K, the multiplication × of an expression by k
to the left or to the right. For easier reading, we will write kE (respectively Ek) for k × E
(respectively E × k). Notice that our definition of K-expressions, whose set is denoted by E_K,
introduces the operator of positive closure. This operator preserves rationality under the
same conditions (see below) as the Kleene closure.

K-expressions are then given by the following grammar:

E → a ∈ Σ | 0 | ε | (E + E) | (E · E) | (E*) | (E⁺) | (kE), k ∈ K | (Ek), k ∈ K

Notice that parentheses will be omitted when not necessary. The expressions E⁺ and
E* are called closure expressions. If a series S is represented by a K-expression E, then
we denote by c(S) (or c(E)) the coefficient of the empty word of S. A K-expression E is
valid [16] if for each closure subexpression F* and F⁺ of E,
$\sum_{i=0}^{+\infty} c(F)^i \in K$.

A K-expression E is proper if for each closure subexpression F* and F⁺ of E, c(F) = 0.
We denote by ℰ_K the set of proper K-expressions. Rational series can then be defined
as formal series expressed by proper K-expressions. For E in ℰ_K, Supp(E) is the support
of the rational series defined by E.

The length of a K-expression E, denoted by ||E||, is the number of occurrences of letters
and of ε appearing in E. In contrast, the literal length, denoted by |E|, is the number
of occurrences of letters in E. For example, the expression E = (a + 3)(b + 2) + (−1) has a
length of 5 and a literal length of 2.

A weighted finite automaton (WFA) on a zero-divisor free semiring K over an alphabet
Σ [8] is a 5-tuple (Σ, Q, I, F, δ) where Q is a finite set of states and I, F and δ are
mappings I : Q → K (input weights), F : Q → K (output weights), and δ : Q × Σ × Q → K
(transition weights). The set of WFAs on K is denoted by ℳ_K. A WFA is homogeneous if
all transitions reaching the same state are labeled by the same letter.

A K-graph is a graph G = (X, U) labeled with coefficients in K, where X is the set of
vertices and U : X × X → K is the function that associates each edge with its label in
K. When there is no edge from p to q, we have U(p, q) = 0. In case K = B, the boolean
semiring, E_B is the set of regular expressions and, as the only element of K \ {0} is 1, we
omit the use of coefficients and of the external product (1a = a1 = a). For a rational series
S represented by E ∈ E_B, Supp(E) is usually called the language of E, denoted by L(E),
and S = Supp(S) = L(E). A boolean automaton (automaton in the sequel) M over an
alphabet Σ is usually defined [8] [12] as a 5-tuple (Σ, Q, I, F, δ) where Q is a finite set of
states, I ⊆ Q the set of initial states, F ⊆ Q the set of final states, and δ ⊆ Q × Σ × Q the
set of edges. We denote by L(M) the language recognized by the automaton M. A graph
G = (X, U) is a B-graph for which labels of edges are not written.

2.2 Extended Glushkov construction 

An algorithm given by Glushkov [10] for computing an automaton with n + 1 states from
a regular expression of literal length n has been extended to semirings K by the authors
[1]. Informally, the principle is to associate exactly one state of the computed automaton
to each occurrence of a letter in the expression. Then, we link two states of the automaton
by a transition if the two occurrences of the corresponding letters in the expression can be
read successively.

In order to recall the extended Glushkov construction, we first have to define the ordered
pairs and the supported operations. An ordered pair (l, i) consists of a coefficient l ∈ K \ {0}
and a position i ∈ ℕ. We also define the functions χ_H : H → K such that χ_H(i) is
equal to 1 if i ∈ H and 0 otherwise. We define P : 2^(K\{0} × ℕ) → 2^ℕ, the function that
extracts positions from a set of ordered pairs, as follows: for Y a set of ordered pairs,
P(Y) = {i_j, 1 ≤ j ≤ |Y| | ∃ (l_j, i_j) ∈ Y}.

The function Coeff_Y : P(Y) → K \ {0} extracts the coefficient associated to a position
i as follows: Coeff_Y(i) = l for (l, i) ∈ Y.

Let Y, Z ⊆ K \ {0} × ℕ be two sets of ordered pairs. We define the product of k ∈ K \ {0}
and Y by k · Y = {(k ⊗ l, i) | (l, i) ∈ Y} and Y · k = {(l ⊗ k, i) | (l, i) ∈ Y}, with
0 · Y = Y · 0 = ∅. We define the operation ⊎ by Y ⊎ Z = {(l, i) | either (l, i) ∈ Y and
i ∉ P(Z), or (l, i) ∈ Z and i ∉ P(Y), or (l_s, i) ∈ Y, (l_t, i) ∈ Z for some l_s, l_t ∈ K
with l = l_s ⊕ l_t ≠ 0}.
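
The operations on sets of ordered pairs can be sketched as follows. This is only an illustration under stated assumptions: the semiring is taken to be (ℤ, +, ×), a set of ordered pairs is stored as a dictionary from positions to coefficients (so each position occurs at most once), and the function names are hypothetical.

```python
from typing import Dict, Set

# A set of ordered pairs (l, i) is stored as {position i: coefficient l}.
# Assumption: weights are Python ints, i.e. the semiring (Z, +, *).
Pairs = Dict[int, int]

def positions(y: Pairs) -> Set[int]:
    """P(Y): the positions occurring in Y."""
    return set(y)

def coeff(y: Pairs, i: int) -> int:
    """Coeff_Y(i): the coefficient attached to position i, for i in P(Y)."""
    return y[i]

def left_mul(k: int, y: Pairs) -> Pairs:
    """k . Y, with 0 . Y equal to the empty set."""
    return {} if k == 0 else {i: k * l for i, l in y.items()}

def right_mul(y: Pairs, k: int) -> Pairs:
    """Y . k, with Y . 0 equal to the empty set."""
    return {} if k == 0 else {i: l * k for i, l in y.items()}

def merge(y: Pairs, z: Pairs) -> Pairs:
    """Y ⊎ Z: coefficients of shared positions are added, zero sums are dropped."""
    out = dict(y)
    for i, l in z.items():
        s = out.get(i, 0) + l
        if s != 0:
            out[i] = s
        else:
            out.pop(i, None)
    return out
```

Dropping a position whose coefficients sum to 0 is the behaviour that later makes the casting operation and the Glushkov construction fail to commute outside the normal forms.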

As in the original Glushkov construction [9] [15], and in order to specify their position
in the expression, letters are subscripted following the order of reading. The resulting
expression is denoted by Ē, defined over the alphabet of indexed symbols Σ̄, each one appearing
at most once in Ē. The set of indices thus obtained is called the set of positions and is
denoted by Pos(E). For example, starting from E = (2a + b)* · a · 3b, one obtains the indexed
expression Ē = (2a_1 + b_2)* · a_3 · 3b_4, Σ̄ = {a_1, b_2, a_3, b_4} and Pos(E) = {1, 2, 3, 4}.
Four functions are defined in order to compute a WFA which need not be deterministic.
First(E) represents the set of initial positions of words of Supp(E), associated with their
input weight; Last(E) represents the set of final positions of words of Supp(E), associated
with their output weight; and Follow(E, i) is the set of positions of words of Supp(E) which
immediately follow position i in the expression E, associated with their transition weight.
In the boolean case, these sets are subsets of Pos(E). Null(E) represents the coefficient of
the empty word. The way to compute these sets is completely formalized in Table 1.

Table 1: Extended Glushkov functions



These functions allow us to define the WFA M̄ = (Σ̄, Q, {s_I}, F, δ) where

1. Σ̄ is the indexed alphabet,

2. s_I is the single initial state, with no incoming edge and with 1 as input weight,

3. Q = Pos(E) ∪ {s_I},

4. F : Q → K is such that
   $F(i) = \begin{cases} Null(E) & \text{if } i = s_I \\ Coeff_{Last(E)}(i) & \text{otherwise} \end{cases}$

5. δ : Q × Σ̄ × Q → K is such that δ(i, a_j, h) = 0 for every h ≠ j, whereas
   $\delta(i, a_j, j) = \begin{cases} Coeff_{First(E)}(j) & \text{if } i = s_I \\ Coeff_{Follow(E,i)}(j) & \text{if } i \neq s_I \end{cases}$

The Glushkov WFA M = (Σ, Q, {s_I}, F, δ) of E is computed from M̄ by replacing the
indexed letters on edges by the corresponding letters in the expression E. We will denote by
A_K : ℰ_K → ℳ_K the mapping such that A_K(E) is the Glushkov WFA obtained from E
by this algorithm proved in
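
Since the content of Table 1 is not reproduced in this text, the following sketch only illustrates the general shape of the four functions. It assumes proper expressions over (ℤ, +, ×), the usual recursive form of Null, First, Last and Follow, and a hypothetical nested-tuple encoding of expressions; it is not a transcription of the table.

```python
# Hypothetical encoding: ("eps",) | ("sym", a) | ("sum", F, G) | ("cat", F, G)
#                        | ("star", F) | ("lmul", k, F) | ("rmul", F, k), k != 0

def merge(y, z):
    """Y ⊎ Z on dicts {position: weight}; zero sums are dropped."""
    out = dict(y)
    for i, w in z.items():
        s = out.get(i, 0) + w
        if s:
            out[i] = s
        else:
            out.pop(i, None)
    return out

def scale_l(k, y):
    return {i: k * w for i, w in y.items()} if k else {}

def scale_r(y, k):
    return {i: w * k for i, w in y.items()} if k else {}

def glushkov(e, counter=None, follow=None):
    """Return (Null, First, Last, Follow); positions follow the reading order."""
    if counter is None:
        counter, follow = [0], {}
    op = e[0]
    if op == "eps":
        return 1, {}, {}, follow
    if op == "sym":
        counter[0] += 1
        i = counter[0]
        follow[i] = {}
        return 0, {i: 1}, {i: 1}, follow
    if op == "lmul":
        n, f, la, _ = glushkov(e[2], counter, follow)
        return e[1] * n, scale_l(e[1], f), la, follow
    if op == "rmul":
        n, f, la, _ = glushkov(e[1], counter, follow)
        return n * e[2], f, scale_r(la, e[2]), follow
    if op == "star":
        n, f, la, _ = glushkov(e[1], counter, follow)
        assert n == 0, "this sketch handles proper expressions only"
        for i, w in la.items():          # loop back from Last(F) to First(F)
            follow[i] = merge(follow[i], scale_l(w, f))
        return 1, f, la, follow
    n1, f1, l1, _ = glushkov(e[1], counter, follow)
    n2, f2, l2, _ = glushkov(e[2], counter, follow)
    if op == "sum":
        return n1 + n2, merge(f1, f2), merge(l1, l2), follow
    for i, w in l1.items():              # op == "cat": Last(F) feeds First(G)
        follow[i] = merge(follow[i], scale_l(w, f2))
    return n1 * n2, merge(f1, scale_l(n1, f2)), merge(scale_r(l1, n2), l2), follow

# E = (2a + b)* . a . 3b, the expression used as an example in this section
E = ("cat",
     ("cat", ("star", ("sum", ("lmul", 2, ("sym", "a")), ("sym", "b"))),
      ("sym", "a")),
     ("lmul", 3, ("sym", "b")))
null, first, last, follow = glushkov(E)
```

The states of the automaton are then s_I together with the four positions; `first` gives the weights of the edges leaving s_I, `follow[i]` those leaving position i, and `last` the output weights.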

In order to compute a K-graph from a homogeneous WFA M, we have to add a new
vertex Φ. Then U, the set of edges, is obtained from the transitions of M by removing labels
and adding directed edges from every final state to Φ. We label the edges to Φ with the output
weights of the final states. The labels of the edges U(i, p) for i ∈ Q, I(i) ≠ 0, p ∈ Q ∪ {Φ} are
⊗-multiplied by the input value of the initial state i of M.

In case M is a Glushkov WFA of a K-expression E, the K-graph obtained from M is
called the Glushkov K-graph of E and is denoted by G_K(E).

2.3 Normal forms and casting operation 
Star normal form and epsilon normal form 

For the boolean case, Brüggemann-Klein defines regular expressions in Star Normal Form
(SNF) [2] as expressions E for which, for each position i of Pos(E), when computing the
Follow(E, i) function, the unions of sets are disjoint. This definition is given only for the usual
operators +, ·, *. We can extend this definition to the positive closure ⁺ as follows:

Definition 1 A B-expression E is in SNF if, for each closure B-subexpression H* or H⁺,
the SNF conditions (1) Follow(H, Last(H)) ∩ First(H) = ∅ and (2) ε ∉ L(H) hold.

Then, the properties of the star normal form (defined with the positive closure) are
preserved.

In the same paper, Brüggemann-Klein also defines the epsilon normal form for the
boolean case. We extend this epsilon normal form to the positive closure operator.

Definition 2 The epsilon normal form for a B-expression E is defined by induction in the
following way:

• [E = ε or E = a] E is in epsilon normal form.

• [E = F + G] E is in epsilon normal form if F and G are in epsilon normal form and
if ε ∉ L(F) ∩ L(G).

• [E = FG] E is in epsilon normal form if F and G are in epsilon normal form.

• [E = F⁺ or E = F*] E is in epsilon normal form if F is in epsilon normal form and
ε ∉ L(F).

Theorem 3 ([2]) For each regular expression E, there exists a regular expression E′ such
that

1. A_B(E) = A_B(E′),

2. E′ is in SNF,

3. E′ can be computed from E in linear time.

Brüggemann-Klein has given every step of the computation of E′. This computation
remains valid here: we just have to add for H⁺ the same rules as for H*. The main steps of
the proof are similar.

We extend the star normal form to multiplicities in this way. Let E be a K-expression.
E is in SNF if, for every subexpression H* or H⁺ in E and for each x in P(Last(H)),

P(Follow(H, x)) ∩ P(First(H)) = ∅.

We do not have to consider the case of the empty word because H⁺ and H* are proper
K-expressions if c(H) = 0.

As an example, let H = 2a_1 + (3b_2)⁺ and E = (H)*. We can see that the expression
E = (2a_1 + (3b_2)⁺)* is not in SNF, because 2 ∈ P(Last(H)) and
2 ∈ P(Follow(H, 2)) ∩ P(First(H)).

The casting operation ~ 

We have to define the casting ~: ℳ_K → ℳ_B. This is similar to the way in which
Buchsbaum et al. [3] define the topology of a graph. A WFA M = (Σ, Q, I, F, δ) is cast into
an automaton M̃ = (Σ, Q, Ĩ, F̃, δ̃) in the following way: Ĩ, F̃ ⊆ Q, Ĩ = {q ∈ Q | I(q) ≠ 0},
F̃ = {q ∈ Q | F(q) ≠ 0} and δ̃ = {(p, a, q) | p, q ∈ Q, a ∈ Σ and δ((p, a, q)) ≠ 0}. The
casting operation can be extended to K-expressions, ~: ℰ_K → E_B. The regular expression Ẽ
is obtained from E by replacing each k ∈ K \ {0} by 1. The ~ operation on E is an embedding
of K-expressions into regular ones. Nevertheless, the Glushkov B-graph computed from a
K-expression E may differ depending on whether the Glushkov construction or the
casting operation ~ is applied first. This is due to properties of K-expressions. For example,
let K = ℚ and E = 2a* + (−2)b* (E is not in epsilon normal form). We then have Ẽ = a* + b*.
We can notice that $\widetilde{A_K(E)} \neq A_B(\widetilde{E})$ (E does not recognize ε
but Ẽ does).
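
The example can be checked directly on the coefficient of the empty word. The computation below is a sketch that assumes c(a*) = c(b*) = 1, which holds because a and b are proper:

```latex
c(E) = 2 \otimes c(a^*) \oplus (-2) \otimes c(b^*)
     = 2 \cdot 1 + (-2) \cdot 1 = 0,
\qquad\text{whereas}\qquad
\varepsilon \in L(\widetilde{E}) = L(a^* + b^*).
```

The output weight of the initial state is therefore 0 in A_K(E), so the initial state is not final after casting, while it is final in the Glushkov automaton of Ẽ.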

Lemma 4 Let E be a K-expression. If E is in SNF and in epsilon normal form, then
$\widetilde{A_K(E)} = A_B(\widetilde{E})$.

Proof We have to show that the automaton obtained by the Glushkov construction for
an expression E in ℰ_K has the same edges as the Glushkov automaton for Ẽ. First, we
have Pos(E) = Pos(Ẽ), as Ẽ is obtained from E only by deleting coefficients. Let us
show that First(Ẽ) = P(First(E)) (states reached from the initial state) by induction
on the length of E. If E = ε, then Ẽ = ε and First(Ẽ) = ∅ = P(First(E)). If
E = a ∈ Σ, with Ē = a_1, then Ẽ = E, First(E) = {(1, 1)} and P(First(E)) = {1} = First(Ẽ).
Let F satisfy the hypothesis, and E = kF, k ∈ K \ {0}. In this case, Ẽ = F̃ and P(First(E)) =
P(k · First(F)) = P(First(F)) = First(F̃) = First(Ẽ). If E = Fk, k ∈ K \ {0}, then Ẽ = F̃ and
P(First(E)) = P(First(F)) = First(F̃) = First(Ẽ).

If E = F + H, and if F and H satisfy the induction hypothesis, and as the coefficient
of the empty word is 0 for one of the two subexpressions F or H (epsilon normal form), we
have Ẽ = F̃ + H̃ and First(F̃ + H̃) = First(F̃) ∪ First(H̃) = P(First(F)) ∪ P(First(H)),
which is equal to P(First(F + H)) by induction. We obtain the same result concerning
F · H, F⁺ and F*.

The equality Last(Ẽ) = P(Last(E)) is obtained similarly.

The last function used to compute the Glushkov automaton is the Follow function.
Let E be a K-expression and i ∈ Pos(E). If E = ε, then Ẽ = ε and Follow(Ẽ, i) = ∅ =
P(Follow(E, i)). If E = a ∈ Σ, then Ẽ = E and Follow(E, i) = ∅. Let F satisfy
Follow(F̃, i) = P(Follow(F, i)) for all i ∈ Pos(F). If E is kF or Fk, k ∈ K \ {0}, then
P(Follow(E, i)) = P(Follow(F, i)) = Follow(F̃, i) by hypothesis. If F and H satisfy
the induction hypothesis, and if E = F + H (and i ∈ Pos(F) without loss of generality),
then Follow(F + H, i) = Follow(F, i) and P(Follow(F, i)) = Follow(F̃, i). We obtain
similar results for E = F · H as there is no intersection between the positions of F and H.
Concerning the star operation, let E = F*, with Follow(F̃, i) = P(Follow(F, i)) for all
i ∈ Pos(F). Then, P(Follow(F*, i)) = P(Follow(F, i) ⊎ Coeff_{Last(F)}(i) · First(F)). But by
definition, as E is in SNF, we know that P(Follow(F, i)) ∩ P(First(F)) = ∅, so
P(Follow(F*, i)) = Follow(F̃*, i). In fact, it means that if there exists a couple
(α, j) ∈ Follow(F, i), there cannot exist (β, j) ∈ First(F). Otherwise, the expression would
not be in SNF, and it would be possible that β = −α, which would remove j from
P(Follow(F*, i)) and imply the deletion of an edge. The same reasoning can be done for the
positive closure operator.

Hence, the casting operation ~ and the Glushkov construction commute for the
composition operation if we do not consider the empty word.



2.4 Characterization of Glushkov automata in the boolean case 

The aim of the paper by Caron and Ziadi [6] is to characterize boolean Glushkov graphs.
We recall here the definitions which allow us to state the main theorem of
their paper. These notions will be necessary to extend this characterization to Glushkov
K-graphs.

A hammock is a graph G = (X, U) without a loop if |X| = 1; otherwise it has two
distinct vertices i and t such that, for any vertex x of X, (1) there exists a path from i to
t going through x, (2) there is no non-trivial path from t to x nor from x to i. Notice that
every hammock with at least two vertices has a unique root (the vertex i) and a unique
anti-root (the vertex t).

Let G = (X, U) be a hammock. We define O = (X_O, U_O) ⊆ G as an orbit of G if and
only if for all x and x′ in X_O there exists a non-trivial path from x to x′. The orbit O is
maximal if, for each vertex x ∈ X_O and for each vertex x′ ∈ X \ X_O, there do not exist
both a path from x to x′ and a path from x′ to x. Equivalently, O ⊆ G is a maximal orbit
of G if and only if it is a strongly connected component with at least one edge.
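
The equivalence with strongly connected components suggests a direct way to compute maximal orbits. The sketch below uses Kosaraju's two-pass traversal, chosen here only for brevity; the function name and the edge-list representation are assumptions of this illustration, and the recursion is suitable for small graphs only.

```python
def maximal_orbits(vertices, edges):
    """Return the maximal orbits: strongly connected components with an edge."""
    succ = {v: set() for v in vertices}
    pred = {v: set() for v in vertices}
    for p, q in edges:
        succ[p].add(q)
        pred[q].add(p)

    order, seen = [], set()

    def visit(v):                      # first pass: record finishing order
        seen.add(v)
        for w in succ[v]:
            if w not in seen:
                visit(w)
        order.append(v)

    for v in vertices:
        if v not in seen:
            visit(v)

    orbits, assigned = [], set()

    def collect(v, comp):              # second pass, on the reversed graph
        assigned.add(v)
        comp.add(v)
        for w in pred[v]:
            if w not in assigned:
                collect(w, comp)

    for v in reversed(order):
        if v not in assigned:
            comp = set()
            collect(v, comp)
            # a single vertex forms an orbit only if it carries a loop
            if len(comp) > 1 or v in succ[v]:
                orbits.append(comp)
    return orbits
```

For instance, on the vertices 0 to 4 with edges 0→1, 1→2, 2→1, 2→3, 3→3 and 3→4, the maximal orbits are {1, 2} and {3}.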

Informally, in a Glushkov graph obtained from a regular expression E, the set of vertices 
of a maximal orbit corresponds exactly to the set of positions of a closure subexpression 
of E. 

The set of direct successors (respectively direct predecessors) of x ∈ X is denoted by
Q⁺(x) (respectively Q⁻(x)). Let n_x = |Q⁻(x)| and m_x = |Q⁺(x)|. For an orbit O ⊆ G,
O⁺(x) denotes Q⁺(x) ∩ (X \ O) and O⁻(x) denotes the set Q⁻(x) ∩ (X \ O). In other
words, O⁺(x) is the set of vertices which are directly reached from x and which are not
in O. By extension, O⁺ = ⋃_{x ∈ O} O⁺(x) and O⁻ = ⋃_{x ∈ O} O⁻(x). The sets
In(O) = {x ∈ X_O | O⁻(x) ≠ ∅} and Out(O) = {x ∈ X_O | O⁺(x) ≠ ∅} denote the input and
the output of the orbit O. As G is a hammock, In(O) ≠ ∅ and Out(O) ≠ ∅. An orbit O is
stable if Out(O) × In(O) ⊆ U. An orbit O is transverse if, for all x, y ∈ Out(O),
O⁺(x) = O⁺(y) and, for all x, y ∈ In(O), O⁻(x) = O⁻(y).
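
Stability and transversality of a single orbit can be tested directly from these definitions. The following is a minimal sketch for the boolean case, assuming the orbit is given as a set of vertices and the graph as a collection of edges (p, q); the function name is hypothetical, and the recursive "strong" variants defined next are not covered.

```python
def is_stable_and_transverse(orbit, edges):
    """Return (stable, transverse) for an orbit given as a set of vertices."""
    edges = set(edges)
    # O^-(x): predecessors of x outside O; O^+(x): successors of x outside O
    o_minus = {x: {p for p, q in edges if q == x and p not in orbit} for x in orbit}
    o_plus = {x: {q for p, q in edges if p == x and q not in orbit} for x in orbit}
    inputs = {x for x in orbit if o_minus[x]}       # In(O)
    outputs = {x for x in orbit if o_plus[x]}       # Out(O)
    # stable: every output of the orbit is linked to every input
    stable = all((o, i) in edges for o in outputs for i in inputs)
    # transverse: all outputs share O^+(x), all inputs share O^-(x)
    transverse = (len({frozenset(o_plus[x]) for x in outputs}) <= 1
                  and len({frozenset(o_minus[x]) for x in inputs}) <= 1)
    return stable, transverse
```

On the graph with edges 0→1, 1→2, 2→1, 2→9 and 0→9 (where 9 plays the role of the sink), the orbit {1, 2} has In(O) = {1} and Out(O) = {2}, and both properties hold.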

An orbit O is strongly stable (respectively strongly transverse) if it is stable (respectively
transverse) and if, after deleting the edges in Out(O) × In(O), (1) there does not exist any
suborbit O′ ⊆ O or (2) every maximal suborbit of O is strongly stable (respectively strongly
transverse). The hammock G is strongly stable (respectively strongly transverse) if (1) it
has no orbit or (2) every maximal orbit O ⊆ G is strongly stable (respectively strongly
transverse).

If G is strongly stable, then we call the graph without orbit of G, denoted by SO(G),
the acyclic directed graph obtained by recursively deleting, for every maximal orbit O of
G, the edges in Out(O) × In(O). The graph SO(G) is then reducible if it can be reduced
to one vertex by iterated applications of the three following rules:

• Rule R1: If x and y are vertices such that Q⁻(y) = {x} and Q⁺(x) = {y}, then
delete y and define Q⁺(x) := Q⁺(y).

• Rule R2: If x and y are vertices such that Q⁻(x) = Q⁻(y) and Q⁺(x) = Q⁺(y),
then delete y and any edge connected to y.

• Rule R3: If x is a vertex such that for all y ∈ Q⁻(x), Q⁺(x) ⊆ Q⁺(y), then delete the
edges in Q⁻(x) × Q⁺(x).

Theorem 5 ([6]) G = (X, U) is a Glushkov graph if and only if the three following
conditions are satisfied:

• G is a hammock.

• Each maximal orbit in G is strongly stable and strongly transverse.

• The graph without orbit SO(G) is reducible.

2.5 The problem of reduction rules



An erroneous statement in the paper by Caron and Ziadi 

In [6], the definition of the R3 rule is wrong in some cases. Indeed, if we consider the
regular expression E = (x_1 + ε)(x_2 + ε) + (x_3 + ε)(x_4 + ε), the graph obtained from the
Glushkov algorithm is as follows.



Let us now try to reduce this graph with the reduction rules as they are defined in
[6]. We can see that the sequence of applicable rules is R3, R3 and R1. We can notice that
there are several choices for the application of the first R3 rule, but once the vertex on
which this first rule is applied has been chosen, the sequence of rules leads to a single graph
(up to the numbering of vertices).



We can see that the graph obtained is no longer reducible. This problem is a consequence
of the multiple computation of the edge (0, Φ). In fact, this problem is solved when each
edge of the acyclic Glushkov graph is computed only once. It is the case when E is in
epsilon normal form.

A new R3 rule for the boolean case

Let G = (X, U) be an acyclic graph. The rule R3 is as follows:

• If x ∈ X is a vertex such that for all y ∈ Q⁻(x), Q⁺(x) ⊆ Q⁺(y), then delete the
edge (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x) if there does not exist a vertex z ∈ X \ {x} such that
the following conditions are true:

  - there is neither a path from x to z nor a path from z to x,
  - q⁻ ∈ Q⁻(z) and q⁺ ∈ Q⁺(z),
  - |Q⁻(z)| × |Q⁺(z)| ≠ 1.

Figure 1: Application of R3 on 1, R3 on 2 and R1 on 1 and 2.



The new rule R3 checks whether the conditions of the old R3 rule are verified and, moreover,
deletes an edge only if it does not correspond to the ε of more than one subexpression.
The validity of this rule is shown in Proposition [TU1 

3 Acyclic Glushkov WFA properties 

The definitions of Section 2.4 related to graphs are extended to K-graphs by considering
that edges labeled 0 do not exist.

Let us consider M a WFA without orbit. Our aim here is to give conditions on weights 
in order to check whether M is a Glushkov WFA. Relying on the boolean characterization, 
we can deduce that M is homogeneous and that the Glushkov graph of M is reducible. 

3.1 K-rules 

K-rules can be seen as an extension of reduction rules. Each rule is divided into two
parts: a graphical condition on edges, and a numerical condition (except for the KR1-rule)
on coefficients. The following definitions allow us to give numerical constraints for the
application of K-rules.

Let G = (X, U) be a K-graph and let x, y ∈ X. Let us now define the set of beginnings
of the set Q⁻(x) as B(Q⁻(x)) ⊆ Q⁻(x). A vertex x⁻ is in B(Q⁻(x)) if for all q⁻ in
Q⁻(x) there is no non-trivial path from q⁻ to x⁻. In the same way, we define the set of
terminations of Q⁺(x) as T(Q⁺(x)) ⊆ Q⁺(x). A vertex x⁺ is in T(Q⁺(x)) if for all q⁺ in
Q⁺(x) there is no non-trivial path from x⁺ to q⁺.

We say that x and y are backward equivalent if Q⁻(x) = Q⁻(y) and there exist l_x, l_y ∈ K
such that for every q⁻ ∈ Q⁻(x), there exists α_{q⁻} ∈ K such that U(q⁻, x) = α_{q⁻} ⊗ l_x and
U(q⁻, y) = α_{q⁻} ⊗ l_y. Similarly, we say that x and y are forward equivalent if
Q⁺(x) = Q⁺(y) and there exist r_x, r_y ∈ K such that for every q⁺ ∈ Q⁺(x), there exists
β_{q⁺} ∈ K such that U(x, q⁺) = r_x ⊗ β_{q⁺} and U(y, q⁺) = r_y ⊗ β_{q⁺}. Moreover, if x and y
are both backward and forward equivalent, then we say that x and y are bidirectionally
equivalent.
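
Backward equivalence can be tested with gcds when the weights factor uniquely. The sketch below assumes weights in the positive integers (a commutative factorial semiring, so left and right gcds coincide) and a K-graph stored as a dictionary from edges to non-zero weights; the function name is hypothetical. The forward test is symmetric.

```python
from functools import reduce
from math import gcd

def backward_factor(U, x, y):
    """Return {q: alpha_q} if x and y are backward equivalent, else None.

    U maps each existing edge (p, q) to its weight, a positive integer.
    """
    preds_x = {p for (p, q) in U if q == x}
    preds_y = {p for (p, q) in U if q == y}
    if not preds_x or preds_x != preds_y:
        return None
    # candidate constants l_x and l_y: gcd of the weights entering x and y
    l_x = reduce(gcd, (U[p, x] for p in preds_x))
    l_y = reduce(gcd, (U[p, y] for p in preds_y))
    alpha = {p: U[p, x] // l_x for p in preds_x}
    # the same alpha_q must also factor the weights entering y
    if all(alpha[p] * l_y == U[p, y] for p in preds_x):
        return alpha
    return None

U = {(0, 1): 6, (5, 1): 10, (0, 2): 9, (5, 2): 15}
print(backward_factor(U, 1, 2))
```

Here the weights entering vertex 1 are 6 = 3·2 and 10 = 5·2, those entering vertex 2 are 9 = 3·3 and 15 = 5·3, so the two vertices share the factors α_0 = 3 and α_5 = 5 with l_x = 2 and l_y = 3.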

In the same way, we say that x is ε-equivalent if for all (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x) the
edge (q⁻, q⁺) exists and if there exist k, l, r ∈ K such that for every q⁻ ∈ Q⁻(x) there exists
α_{q⁻} ∈ K and for every q⁺ ∈ Q⁺(x) there exists β_{q⁺} ∈ K such that U(q⁻, x) = α_{q⁻} ⊗ l,
U(x, q⁺) = r ⊗ β_{q⁺} and U(q⁻, q⁺) = α_{q⁻} ⊗ k ⊗ β_{q⁺}.

Similarly, x is quasi-ε-equivalent if

• B(Q⁻(x)) ≠ Q⁻(x) or T(Q⁺(x)) ≠ Q⁺(x), and

• for all (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x) \ B(Q⁻(x)) × T(Q⁺(x)), the edge (q⁻, q⁺) exists,
and

• there exist k, l, r ∈ K such that for every q⁻ ∈ Q⁻(x) there exists α_{q⁻} ∈ K and for
every q⁺ ∈ Q⁺(x) there exists β_{q⁺} ∈ K such that U(q⁻, x) = α_{q⁻} ⊗ l, U(x, q⁺) =
r ⊗ β_{q⁺}, and

• if q⁻ ∉ B(Q⁻(x)) or q⁺ ∉ T(Q⁺(x))

  - then U(q⁻, q⁺) = α_{q⁻} ⊗ k ⊗ β_{q⁺},

  - else there exists γ ∈ K such that U(q⁻, q⁺) = γ ⊕ α_{q⁻} ⊗ k ⊗ β_{q⁺}. (Notice that
  if the edge from q⁻ to q⁺ does not exist in the automaton, then U(q⁻, q⁺) = 0
  and it is possible to have γ ⊕ α_{q⁻} ⊗ k ⊗ β_{q⁺} = 0.)

In order to clarify our purpose, we have distinguished the case where the edges (q⁻, q⁺) are
superpositions of edges (quasi-ε-equivalence of x) from the case where they are not
(ε-equivalence of x).

Rule KR1: If x and y are vertices such that Q⁻(y) = {x} and Q⁺(x) = {y}, then delete
y and define Q⁺(x) ← Q⁺(y).

Figure 2: KR1 reduction rule

Rule KR2: If x and y are bidirectionally equivalent, with l_x, l_y, r_x, r_y ∈ K the constants
satisfying such a definition, then

• delete y and any edge connected to y,

• for every q⁻ ∈ Q⁻(x) and q⁺ ∈ Q⁺(x), set U′(q⁻, x) = α_{q⁻} and U′(x, q⁺) = β_{q⁺},
where α_{q⁻} and β_{q⁺} are defined as in the bidirectional equivalence.

Figure 3: KR2 reduction rule

Rule KR3: If x is ε-equivalent or x is quasi-ε-equivalent, with l, r, k, γ ∈ K the constants
satisfying such a definition, then

• if x is ε-equivalent

  - then delete every (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x),

  - else delete every (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x) \ B(Q⁻(x)) × T(Q⁺(x)),

• for every q⁻ ∈ Q⁻(x) and q⁺ ∈ Q⁺(x), set U′(q⁻, x) = α_{q⁻} and U′(x, q⁺) = β_{q⁺},
where α_{q⁻} and β_{q⁺} are defined as in the ε-equivalence or quasi-ε-equivalence,

• if x is quasi-ε-equivalent, then compute the new edges from B(Q⁻(x)) × T(Q⁺(x)),
labeled γ.

Figure 4: KR3 reduction when x is ε-equivalent

Figure 5: KR3 reduction when x is quasi-ε-equivalent

3.2 Confluence for K-rules

In order to have an algorithm checking whether a K-graph is a Glushkov K-graph, we
have to know (1) whether it is decidable to apply a K-rule on some vertices and (2) whether the
application of K-rules terminates. In order to ensure these characteristics, we specify some
sufficient properties on the semiring K. Let us define K as a field or as a factorial semiring.
A factorial semiring K is a zero-divisor free semiring in which every non-zero, non-unit
element x of K can be written as a product of irreducible elements of K, x = p_1 ⋯ p_n, and
this representation is unique apart from the order of the irreducible elements. This notion
is a slight adaptation of the notion of factorial ring.

It is clear that, if K is a field, the application of K-rules is decidable: the conditions of
application of K-rules are sufficient to define an algorithm. In the case of a factorial semiring,
as the decomposition is unique, a gcd is defined¹ and it gives us a procedure allowing us
to apply one rule (KR2 or KR3) on a K-graph if it is possible. It ensures the decidability
of K-rules application for factorial semirings. For both cases (field and factorial semiring),
we prove that K-rules are confluent. It ensures the termination of the algorithm allowing us to
know whether a K-graph is a Glushkov one.

We make explicit the algorithms applying the KR2 and KR3 rules. Algorithm 2 tests
whether the KR2-rule graphical and numerical conditions for two states are verified. If
so, it returns the partially reduced K-graph. Algorithm 1 is divided into three functions.
The first one checks whether the KR3 graphical conditions hold on a state
x (KR3GraphicalEquivalenceConditionsChecking) and returns the ε- or
quasi-ε-equivalence type of x. Then, depending on the type of x, the numerical conditions for
ε- or quasi-ε-equivalence are verified (function EquivalenceChecking). Finally, a partially
reduced K-graph is obtained using the GraphComputing function.

Algorithm 1 Application of the KR3 rule for a state

KR3-Application(x, G)
  ▷ Input: One state x of a K-graph G = (X, U)
  ▷ Output: The newly computed graph G
  Begin
    if KR3GraphicalEquivalenceConditionsChecking(x, G, type) = False then
      return False
    ▷ If type is equal to ε (resp. quasi-ε), lines labeled {quasi-ε} (resp. {ε})
    ▷ of the functions below are deleted
    if EquivalenceChecking(x, G, [α], [β], k, [γ]) = False then
      return False
    GraphComputing(x, G, [α], [β], k, [γ])
    return True
  End

¹ In case K is not commutative, left gcd and right gcd are defined.

Algorithm 2 Application of the KR2 rule for two states

KR 2 - Application (2, y, G) 

> Input: Two states x and y of a US-graph G = (X, U) 

> Output: The newly computed graph G 

1 Begin 

2 if Q~(x) ± Q~(y) or Q+(x) ± Q+{y) then 

3 return False 

4 q{ <— a vertex of Q~(x) 

5 gcd r (x) <- U(qi,x) 

6 gcd r (y) «- U(q^,y) 

7 for each g € <5 (^) do 

8 gcd r (x) <— RIGHT GCD([/((/",x),gcd r (x)) 
3 gcd r (y) «- right GCD([/(<T,y),gcd r (y)) 

10 for each g € Q _ (x) do 

compute a q - such that U(q~,x) = a q - ® gcd r (x) 

12 if a ? - ® gcd r {y) ^ U(q~,y) then 

i5 return False 

14 ql <— a vertex of C} + (x) 

15 gcdj(x) <- U(x,qf) 

16 gcd,(y) ^U(y,q+) 

17 for each g + € C} + (x) do 

1<¥ gcd ; (x) <— LEFT GCD(£/(x, <?+),gcdj(x)) 

iS gcdj(y) «- LEFT GCD([/(y,g+),gcd i (y)) 

,20 for each q + € C} + (x) do 

21 compute (5 q + such that U(x,q + ) = gcd ; (x) <2> f3 q + 

22 if gcd, (y) ®f3 q+ ^U(y,q + ) then 

23 return False 

24 delete y and any edge connected to y 

25 for each q~ € Q~{x) do 

26 U(q~,x)^a q - 

27 for each q + € C} + (x) do 

28 U{q + ,x)^f3 q + 

29 return True 

End

GraphComputing(x, G, [α], [β], k, [γ])
▷ Input: One state $x$ of a K-graph $G = (X, U)$
▷ Input: $\alpha \in \mathbb{K}^{|Q^-(x)|}$, $\beta \in \mathbb{K}^{|Q^+(x)|}$, $k \in \mathbb{K}$
▷ Input: and $\gamma \in \mathbb{K}^{|B(Q^-(x))| \times |T(Q^+(x))|}$    ▷ quasi-$\varepsilon$
▷ Output: The newly computed graph $G$
Begin
    for each $q^- \in Q^-(x)$ do
        $U(q^-, x) \leftarrow \alpha_{q^-}$
    for each $q^+ \in Q^+(x)$ do
        $U(x, q^+) \leftarrow \beta_{q^+}$
    delete any edge $(q^-, q^+) \in Q^-(x) \times Q^+(x)$    ▷ $\varepsilon$
    delete any edge $(q^-, q^+) \in Q^-(x) \times Q^+(x) \setminus B(Q^-(x)) \times T(Q^+(x))$    ▷ quasi-$\varepsilon$
    for each $(q^-, q^+) \in B(Q^-(x)) \times T(Q^+(x))$ do    ▷ quasi-$\varepsilon$
        $U(q^-, q^+) \leftarrow \gamma(q^-, q^+)$    ▷ quasi-$\varepsilon$
End



KR3GraphicalEquivalenceConditionsChecking(x, G, type)
▷ Input: One state $x$ of a K-graph $G = (X, U)$
▷ Output: type $\in$ {$\varepsilon$-equivalence, quasi-$\varepsilon$-equivalence}
Begin
    compute $B(Q^-(x))$ and $T(Q^+(x))$
    if $B(Q^-(x)) = Q^-(x)$ and $T(Q^+(x)) = Q^+(x)$ then
        for each $q^- \in Q^-(x)$ do
            for each $q^+ \in Q^+(x)$ do
                if $U(q^-, q^+) = \overline{0}$ then
                    return False
        type $\leftarrow \varepsilon$
        return True
    else
        for each $q^- \in Q^-(x)$ do
            for each $q^+ \in Q^+(x)$ do
                if $(q^-, q^+) \in Q^-(x) \times Q^+(x) \setminus B(Q^-(x)) \times T(Q^+(x))$
                   and $U(q^-, q^+) = \overline{0}$ then
                    return False
        type $\leftarrow$ quasi-$\varepsilon$
        return True
End

EquivalenceChecking(x, G, [α], [β], k, [γ])
▷ Input: One state $x$ of a K-graph $G = (X, U)$
▷ Output: $\alpha \in \mathbb{K}^{|Q^-(x)|}$, $\beta \in \mathbb{K}^{|Q^+(x)|}$, $k \in \mathbb{K}$
▷ Output: and $\gamma \in \mathbb{K}^{|B(Q^-(x))| \times |T(Q^+(x))|}$    ▷ quasi-$\varepsilon$
Begin
    $q_1^- \leftarrow$ a vertex of $Q^-(x)$
    $\gcd_r \leftarrow U(q_1^-, x)$
    for each $q^- \in Q^-(x)$ do
        $\gcd_r \leftarrow$ rightGCD($U(q^-, x)$, $\gcd_r$)
    for each $q^- \in Q^-(x)$ do
        compute $\alpha_{q^-}$ such that $U(q^-, x) = \alpha_{q^-} \otimes \gcd_r$
    $q_1^+ \leftarrow$ a vertex of $Q^+(x)$
    $\gcd_l \leftarrow U(x, q_1^+)$
    for each $q^+ \in Q^+(x)$ do
        $\gcd_l \leftarrow$ leftGCD($\gcd_l$, $U(x, q^+)$)
    for each $q^+ \in Q^+(x)$ do
        compute $\beta_{q^+}$ such that $U(x, q^+) = \gcd_l \otimes \beta_{q^+}$
    $(q_1^-, q_1^+) \leftarrow$ a couple of vertices of $Q^-(x) \times Q^+(x)$    ▷ $\varepsilon$
    $(q_1^-, q_1^+) \leftarrow$ a couple of vertices of $Q^-(x) \times Q^+(x) \setminus B(Q^-(x)) \times T(Q^+(x))$    ▷ quasi-$\varepsilon$
    Find $k_1$ such that $U(q_1^-, q_1^+) = \alpha_{q_1^-} \otimes k_1 \otimes \beta_{q_1^+}$
    if $k_1$ does not exist then
        return False
    for each $(q^-, q^+) \in Q^-(x) \times Q^+(x)$ do
        if $(q^-, q^+) \notin B(Q^-(x)) \times T(Q^+(x))$ then    ▷ quasi-$\varepsilon$
            Find $k$ such that $U(q^-, q^+) = \alpha_{q^-} \otimes k \otimes \beta_{q^+}$
            if $k$ does not exist then
                return False
            elif $k \neq k_1$ then
                return False
    for each $(q^-, q^+) \in B(Q^-(x)) \times T(Q^+(x))$ do    ▷ quasi-$\varepsilon$
        Find $\gamma(q^-, q^+)$ such that    ▷ quasi-$\varepsilon$
            $U(q^-, q^+) = \alpha_{q^-} \otimes k \otimes \beta_{q^+} \oplus \gamma(q^-, q^+)$
        if $\gamma(q^-, q^+)$ does not exist then    ▷ quasi-$\varepsilon$
            return False    ▷ quasi-$\varepsilon$
    return True
End
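
The ε case of EquivalenceChecking can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the commutative semiring (N ∪ {+∞}, min, +) with finite weights, where ⊗ is the ordinary sum, so that the gcd of a set of weights is its minimum and "dividing" is subtracting. The function name, the dictionary encoding of the edges and the sample weights are hypothetical.

```python
from itertools import product

def epsilon_equivalence_minplus(in_w, out_w, direct_w):
    """Epsilon case of EquivalenceChecking in (N U {+inf}, min, +), as a sketch.

    in_w[p]        : weight U(p, x) for each predecessor p of x
    out_w[q]       : weight U(x, q) for each successor q of x
    direct_w[p, q] : weight U(p, q) of the direct edge from p to q
    Returns (alpha, beta, l, r, k), or None when the conditions fail.
    """
    # In min-plus the product is +, so the gcd of a set of weights is its minimum.
    l = min(in_w.values())    # gcd_r: common right factor of the U(p, x)
    r = min(out_w.values())   # gcd_l: common left factor of the U(x, q)
    alpha = {p: w - l for p, w in in_w.items()}   # U(p, x) = alpha_p (x) l
    beta = {q: w - r for q, w in out_w.items()}   # U(x, q) = r (x) beta_q
    k = None
    for p, q in product(in_w, out_w):
        if (p, q) not in direct_w:
            return None       # epsilon-equivalence needs every edge (p, q)
        # Solve U(p, q) = alpha_p (x) k (x) beta_q for k.
        cand = direct_w[p, q] - alpha[p] - beta[q]
        if cand < 0 or (k is not None and cand != k):
            return None       # no solution, or k differs between edges
        k = cand
    return alpha, beta, l, r, k
```

With the hypothetical weights U(p1, x) = 2, U(p2, x) = 3, U(x, q1) = 5, U(p1, q1) = 6 and U(p2, q1) = 7, the sketch finds l = 2, r = 5 and k = 6. The quasi-ε case would additionally need the residual values γ on B(Q⁻(x)) × T(Q⁺(x)), which this sketch leaves out.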



Definition 6 (confluence) Let $G$ be a K-graph and $I_q$ the acyclic graph having only one
vertex. Let $R_1$ be a sequence of K-rules such that $G \xrightarrow{R_1} I_q$. The K-rules are
confluent if, for every K-graph $G_2$ such that there exists a sequence $R_2$ of K-rules with
$G \xrightarrow{R_2} G_2$, there exists a sequence $R'_2$ of K-rules such that
$G_2 \xrightarrow{R'_2} I_q$.

For the following, K is a field or a factorial semiring.

Proposition 7 The K-rules are confluent.

Proof In order to prove this result, we will show that if there exist two applicable K-rules
reducing a Glushkov K-graph, then the order of application does not modify the resulting
K-graph.

Let us denote by $r_{x,y}(G)$ the application of a KR1, KR2 or KR3 rule on the vertices $x$
and $y$, with $y = \emptyset$ for a KR3 rule.

Let $G = (X, U)$ be a Glushkov K-graph and let $r_{x,y}$ and $r_{z,t}$ be two applicable K-rules
on $G$ such that $\{x, y\} \cap \{z, t\} = \emptyset$ and no edge can be deleted by both rules.
Necessarily we have $r_{x,y}(r_{z,t}(G)) = r_{z,t}(r_{x,y}(G))$.

Suppose now that $\{x, y\} \cap \{z, t\} \neq \emptyset$ or one edge is deleted by both rules. We
have to consider several cases depending on the rule $r_{x,y}$.

$r_{x,y}$ is a KR1 rule: In this case $r_{z,t}$ cannot delete the edge from $x$ to $y$, and $r_{z,t}$
is necessarily a KR1-rule with $\{x, y\} \cap \{z, t\} \neq \emptyset$. If $y = z$, as the coefficient
does not act on the reduction rule, $r_{x,y}(r_{z,t}(G)) = r_{z,t}(r_{x,y}(G))$.

$r_{x,y}$ is a KR2 rule: Consider that $r_{z,t}$ is a KR2 rule with $y = z$. Using the notations of
the KR2 rule, there exist $\alpha_{q^-}, \beta_{q^+}, l_x, l_y, r_x, r_y$ such that
$U(q^-, x) = \alpha_{q^-} l_x$, $U(q^-, y) = \alpha_{q^-} l_y$, $U(x, q^+) = r_x \beta_{q^+}$ and
$U(y, q^+) = r_y \beta_{q^+}$ with $q^- \in Q^-(x)$, $q^+ \in Q^+(x)$, and $l_x = \gcd_r(x)$,
$l_y = \gcd_r(y)$ ($r_x = \gcd_l(x)$, $r_y = \gcd_l(y)$). By hypothesis, a KR2 rule can also be
applied on the vertices $y$ and $t$. There also exist $\alpha'_{q^-}, \beta'_{q^+}, l'_x, l'_t, r'_x, r'_t$
such that $\alpha_{q^-} = \alpha'_{q^-} l'_x$, $\beta_{q^+} = r'_x \beta'_{q^+}$,
$U(q^-, t) = \alpha'_{q^-} l'_t$, $U(t, q^+) = r'_t \beta'_{q^+}$ ($Q^-(x) = Q^-(t)$ and
$Q^+(x) = Q^+(t)$). By construction (Algorithm 2) of $\gcd_r(x)$, the left gcd of all $\alpha_{q^-}$ is
1. Then, whatever the order of application of the KR2 rules, the same decomposition of
edge values is obtained. Symmetrically, the same reasoning applies to the right part.

Consider now that $r_{z,t} = r_{z,\emptyset}$ is a KR3 rule. Neither edges from $x$ or $y$ nor edges to
$x$ or $y$ can be deleted by $r_{z,\emptyset}$. Then $z = x$ or $z = y$. Let $z = y$. If we successively
apply $r_{x,y}$ and $r_{y,\emptyset}$, or $r_{y,\emptyset}$ and $r_{x,y}$, on $G$, we obtain the same K-graph
following the same method (function EquivalenceChecking) as in the previous case. If we choose
$z = x$, we also obtain the same K-graph (commutativity property of the sum operator).

$r_{x,\emptyset}$ is a KR3 rule: The only case to consider now is $r_{z,t} = r_{z,\emptyset}$ a KR3 rule.
Suppose that $r_{z,\emptyset}$ deletes an edge also deleted by $r_{x,\emptyset}$ (with $x \neq z$). Let
$(q^-, q^+)$ be this edge.
Using the notations of the KR3 rule, there exist $\alpha_{q^-}, \beta_{q^+}, l, r$ such that
$U(q^-, x) = \alpha_{q^-} l$, $U(x, q^+) = r \beta_{q^+}$,
$U(q^-, q^+) = \alpha_{q^-} k \beta_{q^+} \oplus \gamma$ with $q^- \in Q^-(x)$, $q^+ \in Q^+(x)$ and
$l = \gcd_r(x)$, $r = \gcd_l(x)$. There also exist $\alpha'_{q^-}, \beta'_{q^+}, l', r'$ such that
$U(q^-, z) = \alpha'_{q^-} l'$, $U(z, q^+) = r' \beta'_{q^+}$,
$U(q^-, q^+) = \alpha'_{q^-} k' \beta'_{q^+} \oplus \gamma'$ with $l' = \gcd_r(z)$, $r' = \gcd_l(z)$. By
construction (function EquivalenceChecking), the computations of $l$ and $l'$ ($r$ and $r'$) are
independent. The same reasoning applies to the right part. Then we can choose $\gamma''$ such that
$\gamma = \alpha'_{q^-} k' \beta'_{q^+} \oplus \gamma''$ and $\gamma' = \alpha_{q^-} k \beta_{q^+} \oplus \gamma''$.
So $U(q^-, q^+) = \alpha_{q^-} k \beta_{q^+} \oplus \alpha'_{q^-} k' \beta'_{q^+} \oplus \gamma''$. It is easy
to see that $r_{x,\emptyset}(r_{z,\emptyset}(G)) = r_{z,\emptyset}(r_{x,\emptyset}(G))$.



3.3 K-reducibility 

Definition 8 A K-graph $G = (X, U)$ is said to be K-reducible if it has no orbit and if it
can be reduced to one vertex by iterated applications of any of the three rules KR1, KR2,
KR3 described below.

Proposition 10 shows the existence of a sequence of K-rules leading to the complete
reduction of Glushkov K-graphs. However, the existence of an algorithm allowing us to obtain
this sequence of K-rules depends on the semiring K.

In order to show the K-reducibility property of a Glushkov K-graph $G$, we check
(Lemma 9) that every sequence $\mathcal{R}$ of K-rules leading to the K-reduction of $G$
necessarily contains two KR1 rules, which will be denoted by $r_\alpha$ and $r_\omega$.

Lemma 9 Let $G = (X, U)$ be a K-reducible Glushkov K-graph without orbit with $|X| \ge 3$,
and let $\mathcal{R} = r_1 \cdots r_n$ be the sequence of K-rules which can be applied on $G$ and
reduce it. Necessarily, $\mathcal{R}$ can be written $\mathcal{R}' r_\alpha r_\omega$ with $r_\alpha$ and
$r_\omega$ two KR1-rules merging respectively $s_I$ and $\Phi$.

Proof We show this lemma by induction on the number of vertices of the graph. It is
obvious that if $|X| = 3$, then the only possible graphs are the following ones:




and then, for the first one, $\mathcal{R} = r_\alpha r_\omega$ with $k = \lambda$ in $r_\alpha$ and
$k = \lambda'$ in $r_\omega$. For the second one, $x$ is $\varepsilon$-equivalent and
$\mathcal{R} = r\, r_\alpha r_\omega$ with $r$ a KR3-rule such that $\alpha = 1$, $\beta = 1$,
$l = \lambda$, $r = \lambda'$ and $k = \lambda''$. Then $r_\alpha$ and $r_\omega$ are KR1 rules such that
$k = 1$ for $r_\alpha$ and $r_\omega$. Suppose now that $G$ has $n$ vertices. As it is K-reducible,
there exists a sequence of K-rules which leads to one of the two previous basic cases.

For the reduction process, we associate each vertex of $G$ with a subexpression. We define
$E(x)$ to be the expression of the vertex $x$. At the beginning of the process, $E(x)$ is $a$, the
only letter labelling edges reaching the vertex $x$ (homogeneity of Glushkov automata). For
the vertices $s_I$ and $\Phi$, we define $E(s_I) = E(\Phi) = \varepsilon$. When applying K-rules, we
associate a new expression to each new vertex. With the notations of figure 2, the KR1-rule
induces $E(x) \leftarrow E(x) \cdot k\,E(y)$ with $k = U(x, y)$. With the notations of figure 3, the
KR2-rule induces $E(x) \leftarrow l_x E(x) r_x + l_y E(y) r_y$. And with the notations of figures 4
and 5, the KR3-rule induces $E(x) \leftarrow l E(x) r + k$.
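
The way expressions are attached to vertices during the reduction can be mimicked on plain strings. The following is a minimal sketch, not the authors' implementation: the function names kr1_expr, kr2_expr and kr3_expr are hypothetical, coefficients are simply printed next to the subexpressions, and the formulas follow the three updates as stated in the paragraph above.

```python
def kr1_expr(ex, k, ey):
    """KR1: E(x) <- E(x) . k E(y), where k = U(x, y)."""
    return f"{ex}·{k}{ey}"

def kr2_expr(ex, lx, rx, ey, ly, ry):
    """KR2: E(x) <- lx E(x) rx + ly E(y) ry."""
    return f"({lx}{ex}{rx} + {ly}{ey}{ry})"

def kr3_expr(ex, l, r, k):
    """KR3: E(x) <- l E(x) r + k."""
    return f"({l}{ex}{r} + {k})"

# Usage on hypothetical vertices x and y with letter labels "x" and "y".
ex = kr3_expr("x", 2, 5, 6)   # vertex x with l = 2, r = 5, k = 6
ey = kr3_expr("y", 0, 2, 1)   # vertex y with l = 0, r = 2, k = 1
merged = kr1_expr(ex, 0, ey)  # concatenation through an edge of weight 0
```

Keeping the expressions as strings is only meant to show how the coefficients l, r and k end up inside the expression; a real implementation would more likely build a syntax tree.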

Proposition 10 Let $G = (X, U)$ be a K-graph without orbit. The graph $G$ is a Glushkov
K-graph if and only if it is K-reducible.

($\Rightarrow$) This proposition will be proved by induction on the length of the expression.
First, for $\|E\| = 1$, we have only two proper K-expressions, which are $E = \lambda$ and
$E = \lambda a \lambda'$, for $\lambda, \lambda' \in \mathbb{K}$. When $E = \lambda$, the Glushkov K-graph has
only two vertices, which are $s_I$ and $\Phi$, and the edge is labeled with $\lambda$. Then the KR1
rule can be applied. Suppose now that $E = \lambda a \lambda'$; then the Glushkov K-graph of $E$ has
three vertices and is K-reducible. Indeed, the KR1-rule can be applied twice.

Suppose now that for each proper K-expression $E$ of length $n$, its Glushkov K-graph is
K-reducible. We then have to show that the Glushkov K-graphs of the K-expressions
$F = E + \lambda$, $F = E + \lambda a \lambda'$, $F = \lambda a \lambda' \cdot E$ and $F = E \cdot \lambda a \lambda'$ of
length $n + 1$ are K-reducible. Let us denote by $\mathcal{R}$ (respectively $\mathcal{R}'$) the sequence
of rules which can be applied on $A_{\mathbb{K}}(E)$ (respectively $A_{\mathbb{K}}(F)$). In case
$|X| \ge 3$, $\mathcal{R} = \mathcal{R}_b r_\alpha r_\omega$ (respectively
$\mathcal{R}' = \mathcal{R}'_b r'_\alpha r'_\omega$).

case $F = E + \lambda$: We have $Pos(F) = Pos(E)$, $First(F) = First(E)$, $Last(F) = Last(E)$,
$Null(F) = Null(E) \oplus \lambda$ and $\forall i \in Pos(E)$, $Follow(F, i) = Follow(E, i)$. Every rule
which can be applied on $A_{\mathbb{K}}(E)$ and which does not modify the edge $(s_I, \Phi)$ can also
be applied on $A_{\mathbb{K}}(F)$.

If $A_{\mathbb{K}}(E)$ has only two states, then $\mathcal{R} = r_\alpha$, a KR1-rule, and then
$\mathcal{R}' = r'_\alpha$, a KR1-rule where $r'_\alpha$ is such that $k = Null(E) \oplus \lambda$.
Otherwise, the $(s_I, \Phi)$ edge can only be reduced by a KR3 rule.

Suppose now that there is no KR3 rule modifying $(s_I, \Phi)$ which can be applied on
$A_{\mathbb{K}}(E)$. Then there is a KR3 rule $r'$ which can be applied on $A_{\mathbb{K}}(F)$ with
$k = \lambda$, and then $A_{\mathbb{K}}(F)$ can be reduced by
$\mathcal{R}' = \mathcal{R}_b r' r'_\alpha r'_\omega$.

Let us now suppose that $r_1, r_2, \ldots, r_n$ is the subsequence of KR3-rules of $\mathcal{R}$ which
modify the $(s_I, \Phi)$ edge. Necessarily, $r_n$ acts on a state $x$ which is
$\varepsilon$-equivalent. If $Q^-(x) \neq \{s_I\}$ or $Q^+(x) \neq \{\Phi\}$ then
$\mathcal{R}'_b = \mathcal{R}_b r_{n+1}$ where $r_n$ in $\mathcal{R}'_b$ is modified as follows: $x$ is
quasi-$\varepsilon$-equivalent with $\gamma = \lambda$, and the rule $r_{n+1}$ is a KR3 rule on a state
$x$ which is $\varepsilon$-equivalent and $k = \lambda$. Otherwise, there are two cases to distinguish.
If $Null(E) \oplus \lambda = \overline{0}$ then the $r_n$ rule is no longer applicable on
$A_{\mathbb{K}}(F)$ (no edge between $s_I$ and $\Phi$) and the $r_{n-1}$ rule in $\mathcal{R}'$ now acts
on an $\varepsilon$-equivalent vertex in $A_{\mathbb{K}}(F)$. If
$Null(E) \oplus \lambda \neq \overline{0}$ then $r_n$ can be applied on $A_{\mathbb{K}}(F)$ with
$k = k \otimes \lambda$.

case $F = E + \lambda a \lambda'$: If $|Pos(E)| = n$, we have $Pos(F) = Pos(E) \cup \{n+1\}$,
$First(F) = First(E) \uplus \{(\lambda, n+1)\}$, $Last(F) = Last(E) \uplus \{(\lambda', n+1)\}$,
$Null(F) = Null(E)$ and $\forall i \in Pos(E)$, $Follow(F, i) = Follow(E, i)$ and
$Follow(F, n+1) = \emptyset$. In this case, $\mathcal{R}' = \mathcal{R}_b\, r\, r'_\alpha r'_\omega$ where
$r$ is a KR2 rule with $\alpha_{s_I} = \beta_\Phi = 1$ and $l_y = \lambda$, $r_y = \lambda'$, and so
$A_{\mathbb{K}}(F)$ is K-reducible.

case $F = E \cdot \lambda a \lambda'$: If $|Pos(E)| = n$, we have $Pos(F) = Pos(E) \cup \{n+1\}$,
$First(F) = First(E)$, $Last(F) = \{(\lambda', n+1)\}$, $Null(F) = \overline{0}$ and
$\forall i \in Pos(E) \setminus P(Last(E))$, $Follow(F, i) = Follow(E, i)$ and
$\forall i \in P(Last(E))$, $Follow(F, i) = Follow(E, i) \uplus \{(\lambda, n+1)\}$. Let
$r_1, \ldots, r_n$ be the subsequence of K-rules modifying edges reaching $\Phi$. Necessarily,
$n = 1$ and $r_1 = r_\omega$ (Lemma 9). Indeed, let us suppose that $n > 1$ and that there exists
$j \neq 1$ such that $r_j$ is a KR1, KR2 or KR3-rule. Necessarily $|Q^-(\Phi)| > 1$, which
contradicts our hypothesis. Then we have $\mathcal{R}' = \mathcal{R}\, r_{n+1}$ where $r_\omega$, the
KR1-rule from a vertex $x$ to $\Phi$ of the sequence $\mathcal{R}$ and labeled with $k_1$, is modified
in $\mathcal{R}'$ as follows: $k = k_1 \otimes \lambda$. We also have $k = \lambda'$ for the rule
$r_{n+1}$.

The case $F = \lambda a \lambda' \cdot E$ is proved similarly to the previous one, considering the
rules modifying edges from $s_I$ (with $r_\alpha$ instead of $r_\omega$).

($\Leftarrow$) By induction on the number of states of the reducible K-graph $G = (X, U)$. If
$|X| = 2$, $X = \{s_I, \Phi\}$ and the only K-expression $E$ is $\lambda$ with $\lambda \in \mathbb{K}$.
Let $G' = (X', U')$ be the Glushkov K-graph obtained from $E$. By construction
$\lambda = U(s_I, \Phi) = E(s_I)$ and $\lambda = Null(E)$; necessarily $G' = G$.

We consider the property true for ranks below $n + 1$ and $G$ a partially reduced K-graph.
Three cases can occur according to the graphic form of the partially reduced graph. Either
we will have to apply the KR1-rule twice, or the KR3-rule once and the KR1-rule twice
if $X = \{s_I, x, \Phi\}$, or we will have to apply the KR2-rule once and the KR1-rule twice if
$X = \{s_I, x, y, \Phi\}$. For each case, we successively compute the new expressions of the
vertices, and we check that the Glushkov construction applied on the final K-expression is $G$.



3.4 Several examples of use for K-rules 

For the KR2 rule, the first example is for transducers in
$(\mathbb{K}, \oplus, \otimes) = (\Sigma^* \cup \emptyset, \cup, \cdot)$ where "$\cdot$" denotes the
concatenation operator. In this case, we can express the KR2 rule conditions as follows. For all
$q^-$ in $Q^-(x)$, $\alpha_{q^-}$ is the common prefix of $U(q^-, x)$ and $U(q^-, y)$. Likewise,
for all $q^+$ in $Q^+(x)$, $\beta_{q^+}$ is the common suffix of $U(x, q^+)$ and $U(y, q^+)$.

[Figure: KR2 reduction of a transducer graph; the merged vertex is labeled $(aE(x)a + bE(y)b)$.]
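
For the transducer case, the KR2 conditions can be checked directly on strings. The sketch below is only an illustration under stated assumptions: edge labels are single words (sets of words are not handled), the right gcd is taken to be the longest common suffix and the left gcd the longest common prefix, and the function names, the dictionary encoding and the sample labels are hypothetical.

```python
import os

def common_prefix(words):
    """Left gcd in the free monoid: longest common prefix of the words."""
    return os.path.commonprefix(list(words))

def common_suffix(words):
    """Right gcd in the free monoid: longest common suffix of the words."""
    reversed_words = [w[::-1] for w in words]
    return os.path.commonprefix(reversed_words)[::-1]

def kr2_check_strings(in_x, in_y, out_x, out_y):
    """Check the KR2 conditions for two states x and y labeled by words.

    in_x[p] = U(p, x), in_y[p] = U(p, y), out_x[q] = U(x, q), out_y[q] = U(y, q).
    Returns (alpha, beta, (lx, rx), (ly, ry)) or None when KR2 does not apply.
    """
    if in_x.keys() != in_y.keys() or out_x.keys() != out_y.keys():
        return None                       # Q-(x) != Q-(y) or Q+(x) != Q+(y)
    lx, ly = common_suffix(in_x.values()), common_suffix(in_y.values())
    rx, ry = common_prefix(out_x.values()), common_prefix(out_y.values())
    alpha, beta = {}, {}
    for p, word in in_x.items():
        alpha[p] = word[:len(word) - len(lx)]   # U(p, x) = alpha_p . lx
        if alpha[p] + ly != in_y[p]:
            return None
    for q, word in out_x.items():
        beta[q] = word[len(rx):]                # U(x, q) = rx . beta_q
        if ry + beta[q] != out_y[q]:
            return None
    return alpha, beta, (lx, rx), (ly, ry)
```

On the hypothetical labels U(p1, x) = "ca", U(p2, x) = "da", U(p1, y) = "cb", U(p2, y) = "db", U(x, q1) = "ac", U(x, q2) = "ad", U(y, q1) = "bc" and U(y, q2) = "bd", the sketch returns l_x = r_x = "a" and l_y = r_y = "b", which corresponds to a merged expression of the form aE(x)a + bE(y)b.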

The second one is in $(\mathbb{Z}/7\mathbb{Z}[i, j, k], \oplus, \otimes)$, where $\{i, j, k\}$ are
elements of the quaternions, $\oplus$ is the sum and $\otimes$ the product. In this case, K is a
field. Every factorization leads to the result.

[Figure: KR2 reduction of a graph weighted over the quaternions.]

We now give a complete example using the three rules on the
$(\mathbb{N} \cup \{+\infty\}, \min, +)$ semiring. This example illustrates the problem of
quasi-$\varepsilon$-equivalence. For this example, we identify each vertex with its label.

[Figure: successive reductions of the example graph. The steps shown are:]

1. The KR3 rule can be applied on $x$ with $l_x = 2$, $r_x = 5$ and $k = 6$.

2. The KR3 rule can be applied on $y$ with $l_y = 0$, $r_y = 2$ and $k = 1$.

3. The KR1 rule can be applied on $(2x5 + 6)$ and on $(0y2 + 1)$.

4. The KR2 rule can be applied on $(2x5 + 6)(0y2 + 1)$ and on $z$.

5. A KR3 rule can be applied to end the process.


This example leads to a possible K-expression such as $E = ((2x5 + 6)(0y2 + 1) + 2z) + 3$.

4 Glushkov K-graph with orbits 

We will now consider a graph which has at least one maximal orbit $O$. We extend the
notions of strong stability and strong transversality to the K-graphs obtained from
K-expressions in SNF. We have to give a characterization on coefficients only. The stability
and transversality notions are rather linked. Indeed, if we consider the states of $In(O)$ as
those of $O^+$, then both notions amount to transversality. Moreover, the extension of
these notions to WFAs (K-stability, Definition 12, and K-transversality, Definition 14)
implies the manipulation of output and input vectors of $O$ whose product is exactly the
orbit matrix of $O$ (Proposition 17).

Lemma 11 Let $E$ be a K-expression and $G_{\mathbb{K}}(E)$ its Glushkov K-graph. Let
$O = (X_O, U_O)$ be a maximal orbit of $G_{\mathbb{K}}(E)$. Then $E$ contains a closure
subexpression $F$ such that $X_O = Pos(F)$.

This lemma is a direct consequence of Lemma 4.5 in [6] and of Lemma [U 

Definition 12 (K-stability) A maximal orbit $O$ of a K-graph $G = (X, U)$ is K-stable if

• $O$ is stable and

• the matrix $M_O \in \mathbb{K}^{|Out(O)| \times |In(O)|}$ such that $M_O(s, e) = U(s, e)$, for each
$(s, e)$ of $Out(O) \times In(O)$, can be written as a product $VW$ of two vectors such that
$V \in \mathbb{K}^{|Out(O)| \times 1}$ and $W \in \mathbb{K}^{1 \times |In(O)|}$.

The graph $G$ is K-stable if each of its maximal orbits is K-stable.

If a maximal orbit $O$ is K-stable, $M_O$ is a matrix of rank 1 called the orbit matrix.
Then, for a decomposition of $M_O$ in the product $VW$ of two vectors, $V$ will be called the
tail-orbit vector of $O$ and $W$ will be called the head-orbit vector of $O$.

Lemma 13 A Glushkov K-graph obtained from a K-expression $E$ in SNF is K-stable.

Proof Let $G$ be the Glushkov K-graph of a K-expression $E$ in SNF,
$A_{\mathbb{K}}(E) = (\Sigma, Q, s_I, F, \delta)$ its Glushkov WFA and $O = (X_O, U_O)$ be a maximal
orbit of $G$. Following Lemma 4 and Theorem 5, $G$ is strongly stable, which implies that every
orbit of $G$ is stable. Let $s_i \in Out(O)$, $1 \le i \le |Out(O)|$ and $e_j \in In(O)$,
$1 \le j \le |In(O)|$. Following the extended Glushkov construction and as for all
$s_i \in Out(O)$, $s_i \neq s_I$, we have
$\delta(s_i, a, e_j) = \mathrm{Coeff}_{Follow(E, s_i)}(e_j)$. As $O$ corresponds to a closure
subexpression $F^*$ or $F^+$ (Lemma 11) and as $(s_i, a, e_j)$ is an edge of
$X_O \times \Sigma \times X_O$, we have
$\delta(s_i, a, e_j) = \mathrm{Coeff}_{Follow(F^*, s_i)}(e_j) = \mathrm{Coeff}_{Follow(F, s_i) \uplus \mathrm{Coeff}_{Last(F)}(s_i).First(F)}(e_j)$.
As $E$ is in SNF, so are $F^*$ and $F^+$, and then
$\delta(s_i, a, e_j) = \mathrm{Coeff}_{\mathrm{Coeff}_{Last(F)}(s_i).First(F)}(e_j) = \mathrm{Coeff}_{Last(F)}(s_i) \otimes \mathrm{Coeff}_{First(F)}(e_j)$.
The lemma is proved by choosing $V \in \mathbb{K}^{|Out(O)| \times 1}$ such that
$V(i, 1) = \mathrm{Coeff}_{Last(F)}(s_i)$ and $W \in \mathbb{K}^{1 \times |In(O)|}$ with
$W(1, j) = \mathrm{Coeff}_{First(F)}(e_j)$.

■ 
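
The factorization condition of Definition 12 can be tested mechanically. The following minimal sketch is not taken from the paper: it assumes the commutative semiring (N ∪ {+∞}, min, +) with finite entries, where a product VW of a column vector and a row vector means M(s, e) = V(s) + W(e). The function name and the sample matrices are hypothetical.

```python
def rank_one_minplus(M):
    """Try to write M(s, e) = V(s) + W(e), i.e. M = VW in min-plus.

    M is a list of rows of finite naturals, with rows indexed by Out(O)
    and columns indexed by In(O). Returns (V, W), or None if M has no
    such decomposition.
    """
    # Candidate head-orbit vector: the minimum of each column.
    W = [min(col) for col in zip(*M)]
    V = []
    for row in M:
        # All entries of a row must exceed W by the same amount V(s).
        diffs = {m - w for m, w in zip(row, W)}
        if len(diffs) != 1:
            return None
        V.append(diffs.pop())
    return V, W
```

The decomposition is not unique, since a constant can be moved from V to W; this sketch returns the one in which the smallest entry of V is 0. For example, the matrix [[3, 4], [2, 3]] is decomposed as V = [1, 0] and W = [2, 3], whereas [[3, 4], [2, 5]] is rejected.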

Definition 14 (K-transversality) A maximal orbit $O$ of $G = (X, U)$ is K-transverse if

• $O$ is transverse,

• the matrix $M_e \in \mathbb{K}^{|O^-| \times |In(O)|}$ such that $M_e(p, e) = U(p, e)$ for each
$(p, e)$ of $O^- \times In(O)$, can be written as a product $ZT$ of two vectors such that
$Z \in \mathbb{K}^{|O^-| \times 1}$ and $T \in \mathbb{K}^{1 \times |In(O)|}$,

• the matrix $M_s \in \mathbb{K}^{|Out(O)| \times |O^+|}$ such that $M_s(s, q) = U(s, q)$ for each
$(s, q)$ of $Out(O) \times O^+$, can be written as a product $T'Z'$ of two vectors such that
$T' \in \mathbb{K}^{|Out(O)| \times 1}$ and $Z' \in \mathbb{K}^{1 \times |O^+|}$.

The graph $G$ is K-transverse if each of its maximal orbits is K-transverse.

If a maximal orbit $O$ is K-transverse, $M_e$ (respectively $M_s$) is a matrix of rank 1
called the input matrix of $O$ (respectively output matrix of $O$). For a decomposition of $M_e$
(respectively $M_s$) in the product $ZT$ (respectively $T'Z'$) of two vectors, $T$ will be called
the input vector (respectively $T'$ will be called the output vector) of $O$.

Lemma 15 The Glushkov K-graph $G = (X, U)$ of a K-expression $E$ in normal form is
K-transverse.

Proof Let $O$ be a maximal orbit of $G$. Following Lemma 4 and Theorem 5, $G$ is strongly
transverse, which implies that $O$ is transverse. By Lemma 11 there exists a maximal closure
subexpression $H$ such that $H = F^*$ or $H = F^+$. As $E$ is in normal form, so is $H$. By the
definition of the function Follow, we have in this case: for all $p \in Out(O)$, for all
$q \in O^+$, $U(p, q) = \mathrm{Coeff}_{Follow(E, p)}(q)$. We now have to distinguish three cases.

1. If $|O^+| = 1$, then the result holds immediately. Indeed, the output matrix of $O$ is a
vector.

2. If $O^+ = \{q_1, \ldots, q_n\}$ and $n > 1$, $\forall\, 1 \le j \le n$, $q_j \neq \Phi$, necessarily
we have $O^+ = \biguplus_i P(First(H_i))$ with $H_i$ some subexpressions of $E$. Then we have
$U(p, q_j) = \mathrm{Coeff}_{\mathrm{Coeff}_{Last(H)}(p).First(H_i)}(q_j)$ if $q_j \in P(First(H_i))$.
Then, as $q_j$ is a first position of only one subexpression,
$U(p, q_j) = k_p \otimes \mathrm{Coeff}_{First(H_i)}(q_j)$ where $k_p = \mathrm{Coeff}_{Last(H)}(p)$,
which concludes this case.

3. Now if $\exists\, 1 \le j \le n \mid q_j = \Phi$, then
$U(p, q_j) = \mathrm{Coeff}_{Last(F)}(p) \otimes k$ where $k$ is the Null value of some
subexpression following $F$, not depending on $p$.

The same reasoning can be used for the left part of the transversality.

■ 

Definition 16 (K-balanced) The orbit $O$ of a graph $G$ is K-balanced if $G$ is K-stable
and K-transverse and if there exist an input vector $T$ of $O$ and an output vector $T'$ of $O$
such that the orbit matrix $M_O = T'T$.

Proposition 17 A Glushkov K-graph obtained from a K-expression $E$ in SNF is
K-balanced.

Proof Lemma 13 shows that $V$, the tail-orbit vector of $O$, is such that
$V(i, 1) = \mathrm{Coeff}_{Last(F)}(i)$ for all $i \in P(Last(F))$, which is, from Lemma 15, the
output vector of $O$. The details of the proofs of these lemmas show in the same way that there
exist a head-orbit vector and an input vector for $O$ which are equal.

■ 

We can now define the recursive version of the K-balanced property of WFAs.

Definition 18 A K-graph is strongly K-balanced if (1) it has no orbit or (2) it is
K-balanced and if, after deleting all edges $Out(O) \times In(O)$ of each maximal orbit $O$, it is
strongly K-balanced.

Proposition 19 A Glushkov K-graph obtained from a K-expression $E$ in SNF is strongly
K-balanced.

Proof Let $G$ be the Glushkov K-graph of a K-expression $E$ and $O$ be a maximal orbit of $G$.
The Glushkov K-graph $G$ is strongly stable and strongly transverse. As $E$ is in SNF, edges of
$Out(O) \times In(O)$ that are deleted are backward edges of a unique closure subexpression $F^*$
or $F^+$. Consequently, the recursive process of edge removal deduced from the definition
of strong K-stability produces only maximal orbits which are K-balanced. The orbit $O$ is
therefore strongly K-balanced.

Theorem 20 Let $G = (X, U)$. $G$ is a Glushkov K-graph of a K-expression $E$ in SNF if
and only if

• $G$ is strongly K-balanced.

• The graph without orbit of $G$ is K-reducible.

Proof Let $G = (X, U)$ be a Glushkov K-graph. From Proposition 19, $G$ is strongly
K-balanced. The graph without orbit of $G$ is K-reducible (Proposition 10). For the converse
part of the theorem, if $G$ has no orbit and $G$ is K-reducible, by Proposition 10 the result
holds immediately. Let $O$ be a maximal orbit of $G$. As it is strongly K-balanced, we can
write $M_O = VW$ the orbit matrix of $O$; there exist an output vector $T'$ equal to the
tail-orbit vector $V$ and an input vector $T$ equal to the head-orbit vector $W$. If the graph
without orbit of $O$ corresponds to a K-expression $F$, then $O$ corresponds to the K-expression
$F^+$ where $\mathrm{Coeff}_{First(F^+)}(i) = W(1, i)$, $\forall i \in P(First(F^+))$,
$\mathrm{Coeff}_{Last(F^+)}(j) = V(j, 1)$, $\forall j \in P(Last(F^+))$. We also have
$\mathrm{Coeff}_{Follow(F^+, j)}(i) = \mathrm{Coeff}_{Follow(F, j) \uplus \mathrm{Coeff}_{Last(F)}(j).First(F)}(i)$,
$\forall j \in P(Last(F))$ and $\forall i \in P(First(F))$. Hence the Glushkov functions are well
defined.

We now have to show that the graph without orbit of $O$ can be reduced to a single
vertex. By the successive applications of the K-rules, the vertices of the graph without
orbit of $O$ can be reduced to a single state (giving a K-rational expression for $O$). Indeed,
as $O$ is transverse, no K-rule concerning one vertex of $O$ and one vertex out of $O$ can be
applied.



5 Algorithm for orbit reduction 

In this section, we present a recursive algorithm that computes a K-expression from a
Glushkov K-graph. We then give an example which illustrates this method.






Algorithms 



OrbitReduction(G)
▷ Input: A K-graph $G = (X, U)$
▷ Output: A newly computed graph without orbit
Begin
    for each maximal orbit $O = (X_O, U_O)$ of $G$ do
        if BackEdgesRemoval($O$, $T$, $T'$, $Z$, $Z'$) then
            if OrbitReduction($O$) then
                if Expression($E_O$, $O$, $T$, $T'$) then
                    ReplaceStates($G$, $O$, $E_O$, $Z$, $Z'$)
                else return False
            else return False
        else return False
    return True
End



The BackEdgesRemoval function on $O$ deletes edges from $Out(O)$ to $In(O)$, and returns
true if the vectors $T$, $T'$, $Z$, $Z'$ (as defined in Definition 14) can be computed, false
otherwise.

The Expression function returns true, computes the K-expression $E$ of
$G' = (O \cup \{s_I, \Phi\}, U')$ where
$U' \leftarrow U \cup \{(s_I, T(1, j), e_j) \mid e_j \in In(O)\} \cup \{(s_i, T'(i, 1), \Phi) \mid s_i \in Out(O)\}$
and outputs $E_O$ if $O$ is K-reducible. It returns false otherwise.

The ReplaceStates function replaces $O$ by one state $x$ labeled $E_O$ and connected to
$O^-$ and $O^+$ with the sets of coefficients of $Z$ and $Z'$. Formally,
$G = (X \setminus X_O \cup \{x\}, U)$ with
$U \leftarrow U \setminus \{(u, k, v) \mid u, v \in O\} \cup \{(p_j, Z(j, 1), x) \mid p_j \in O^-\} \cup \{(x, Z'(1, i), q_i) \mid q_i \in O^+\}$.

BackEdgesRemoval(O, T, T', Z, Z')
▷ Input: a K-graph $O = (X_O, U_O)$, $M_e \in \mathbb{K}^{|O^-| \times |In(O)|}$
▷ Input: $M_s \in \mathbb{K}^{|Out(O)| \times |O^+|}$, $M_O \in \mathbb{K}^{|Out(O)| \times |In(O)|}$
▷ Output: $T \in \mathbb{K}^{1 \times |In(O)|}$, $T' \in \mathbb{K}^{|Out(O)| \times 1}$, $Z \in \mathbb{K}^{|O^-| \times 1}$, $Z' \in \mathbb{K}^{1 \times |O^+|}$
Begin
    for each line $l$ of $M_e$ do
        $\gcd_l(l) \leftarrow$ left GCD of all values of the line $l$
    ▷ $\gcd_l$ is the vector of $\gcd_l(l)$ values
    Find a vector $\widehat{\gcd}_l$ such that $M_e = \gcd_l \otimes \widehat{\gcd}_l$
    if $\widehat{\gcd}_l$ does not exist then
        return False
    for each column $c$ of $M_s$ do
        $\gcd_r(c) \leftarrow$ right GCD of all values of the column $c$
    ▷ $\gcd_r$ is the vector of $\gcd_r(c)$ values
    Find a vector $\widehat{\gcd}_r$ such that $M_s = \widehat{\gcd}_r \otimes \gcd_r$
    if $\widehat{\gcd}_r$ does not exist then
        return False
    Find $k$ such that $M_O = \widehat{\gcd}_r \otimes k \otimes \widehat{\gcd}_l$
    if $k$ does not exist then
        return False
    $A \leftarrow$ right GCD of all values of the $\gcd_l$ vector
    $B \leftarrow$ left GCD of all values of the $\gcd_r$ vector
    $k_1 \leftarrow$ leftGCD($B$, $k$)
    Find $k_2$ such that $k = k_1 \otimes k_2$
    if rightGCD($k_2$, $A$) $\neq k_2$ then
        return False
    $T \leftarrow k_2 \otimes \widehat{\gcd}_l$
    $T' \leftarrow \widehat{\gcd}_r \otimes k_1$
    Find $Z$ such that $\gcd_l = Z \otimes k_2$
    Find $Z'$ such that $\gcd_r = k_1 \otimes Z'$
    delete any edge from $Out(O)$ to $In(O)$
    return True
End
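
The arithmetic of BackEdgesRemoval can be followed on a small instance. The sketch below is only an illustration under stated assumptions, not the authors' code: it works in the positive integers with ordinary multiplication (a commutative setting, so left and right gcds coincide), it takes the three matrices as lists of rows with strictly positive entries, and it returns the vectors instead of deleting edges. All names and sample values are hypothetical.

```python
from functools import reduce
from math import gcd

def row_factor(M):
    """Write M[i][j] = g[i] * h[j] with g[i] the gcd of row i, or return None."""
    g = [reduce(gcd, row) for row in M]
    h = [m // g[0] for m in M[0]]
    for gi, row in zip(g, M):
        if [gi * hj for hj in h] != list(row):
            return None
    return g, h

def back_edges_removal(Me, Ms, Mo):
    """Compute (T, T', Z, Z') from the input, output and orbit matrices.

    Me: rows O-, columns In(O); Ms: rows Out(O), columns O+;
    Mo: rows Out(O), columns In(O). Returns None when a step fails.
    """
    left = row_factor(Me)                 # Me = gcd_l (column) x hat_l (row)
    right = row_factor(list(zip(*Ms)))    # transposed: Ms = hat_r (column) x gcd_r (row)
    if left is None or right is None:
        return None
    gcd_l, hat_l = left
    gcd_r, hat_r = right
    # Find k such that Mo = hat_r x k x hat_l.
    k, rem = divmod(Mo[0][0], hat_r[0] * hat_l[0])
    if rem or any(Mo[i][j] != hat_r[i] * k * hat_l[j]
                  for i in range(len(hat_r)) for j in range(len(hat_l))):
        return None
    A = reduce(gcd, gcd_l)
    B = reduce(gcd, gcd_r)
    k1 = gcd(B, k)
    k2 = k // k1                          # k = k1 * k2
    if gcd(k2, A) != k2:                  # k2 must divide every gcd_l value
        return None
    T = [k2 * h for h in hat_l]
    T_prime = [h * k1 for h in hat_r]
    Z = [g // k2 for g in gcd_l]
    Z_prime = [g // k1 for g in gcd_r]
    return T, T_prime, Z, Z_prime
```

For instance, with Z = [3, 5], T = [2, 4], T' = [6, 3] and Z' = [7, 1], the matrices are Me = [[6, 12], [10, 20]], Ms = [[42, 6], [21, 3]] and Mo = [[12, 24], [6, 12]], and the sketch recovers these four vectors. In general the decomposition found need not be the one used to build the matrices; only the products ZT, T'Z' and T'T are guaranteed to match.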



Illustrated example 

We illustrate the characteristics of Glushkov WFAs developed in this paper with a reduction
example in the $(\mathbb{N} \cup \{+\infty\}, \min, +)$ semiring. This example deals with the
reduction of an orbit and its connection to the outside. We first reduce the orbit to one state
and replace the orbit by this state in the original graph. This new state is then linked to the
predecessors (respectively successors) of the orbit with vector $Z$ (respectively $Z'$) as label
of edges.

Let $G$ be the K-subgraph of Figure 6 and let $O$ be the only maximal orbit of $G$ such
that $X_O = \{a_1, b_2, c_3, a_4, b_5, b_6, c_7\}$.




VW. We easily check that the orbit is K-balanced. There is an input vector $T$ which is
equal to $W$ and an output vector $T'$ which is equal to $V$.



Then, we delete back edges and add $s_I$ and $\Phi$ vertices for the orbit $O$. The $s_I$ vertex
is connected to $In(O)$. Labels of edges are values of the $T$ vector. Every vertex of $Out(O)$
is connected to $\Phi$. Labels of edges are values of the $T'$ vector. The following graph is then
reduced to one state by iterated applications of K-rules.




tively $O^+$) are connected to the newly computed state choosing $Z$ as vector of coefficients
(respectively $Z'$).

[Figure: the reduced graph, in which the orbit is replaced by a single state labeled
$((2a + b3 + c2) \cdot a \cdot b \cdot (4b + 5c2))$.]




6 Conclusion 

While trying to characterize Glushkov K-graphs, we have pointed out an error in the paper
by Caron and Ziadi [6] that we have corrected. This correction allowed us to extend the
characterization to K-graphs, restricting K to factorial semirings or fields. For fields, the
conditions of application of the K-rules are sufficient to have an algorithm.

For the case of strict semirings, this limitation allowed us to work with GCDs and then
to give algorithms for computing K-expressions from Glushkov K-graphs.

This characterization is divided into two main parts. The first one is the reduction of
an acyclic Glushkov K-graph into one single vertex labeled with the whole K-expression.
Thanks to the confluence of the K-rules, we can be sure that this algorithm ends without
doing a depth-first search. The second one relies on orbit properties. These criteria allow
us to give an algorithm computing a single vertex from each orbit.

In case the expression is not in SNF or the semiring is not zero-divisor free, some edges
are computed in several steps (coefficients are $\oplus$-added), which implies that some edges
may be deleted. Then this characterization does not hold. A question then arises: the
factorial condition is a sufficient condition to have an algorithm. Is it also a necessary
condition?